A simplified exactly solvable model for β-amyloid aggregation

We propose an exactly solvable simplified statistical mechanical model for the thermodynamics
of β-amyloid aggregation, generalizing a well-studied model for protein folding. The monomer
concentration is explicitly taken into account, as well as a nontrivial dependence on the
microscopic degrees of freedom of the single peptide chain, both in the α-helix folded isolated
state and in the fibrillar one. The phase diagram of the model is studied and compared to the
outcome of fibril formation experiments, which is qualitatively reproduced.

Amyloids are insoluble fibrillar aggregates of proteins, stabilized mostly by hydrogen bonds
and hydrophobic interactions. They are implicated in debilitating human pathologies, such as
Alzheimer's disease, Parkinson's disease and spongiform encephalopathies. Cytotoxic species
have recently been identified with transient soluble oligomeric structures, whereas amyloid
fibrils are believed to be the final, most stable state of the aggregation process [1].
Virtually all proteins can be induced to adopt the amyloid structure under appropriate
conditions.

A common signature of fibril formation is the presence of a stable core of cross-β structure,
with β-strands running orthogonal to the fibril axis and forming several β-sheets which may
intertwine along the latter. The cross-β structure is identified through its typical X-ray
diffraction pattern and binding to specific fluorescent dyes. More sophisticated techniques,
such as solid-state NMR, are needed in order to provide structural models at the atomic level.
In a few known cases, for intermediate chain lengths between 20 and 40, all peptide monomers
may adopt a repeating hairpin structure within the fibrillar aggregate [1, 3]. This is then
stabilized by interchain hydrogen bonds between the same residues in different chains, leading
to the so-called parallel in-register arrangement shown in Fig. 1.

The conformational ensembles populated at low concentration by proteins which aggregate into
amyloid fibrils at higher density may vary from the large number of fluctuating structures of
natively unfolded proteins and peptides, such as the Aβ-peptide related to Alzheimer's, to the
well-defined structures of globular proteins. In the latter case, the competition between the
stability of the native structure and that of the amyloid fibrils is crucial in determining the
amyloidogenic behavior [3].

In the context of protein folding, simple models based 
on the geometry of the native structure have been very 
useful in unraveling folding kinetics. In the same man- 
ner, one can speculate that the geometry of the fibrillar 
aggregate, as typified by the parallel in-register hairpin 
structure, may play a similar role in aggregation kinetics. Within this spirit, the competition
described above for the aggregation of globular proteins becomes a competition between two
alternative geometries, which needs to be assessed already at equilibrium.

The purpose of the present Letter is to propose a simplified statistical mechanical model for
β-amyloid aggregation, generalizing a well-studied one for protein folding. Our model
explicitly depends on protein concentration and has the virtue of being exactly solvable. For
more realistic descriptions, even at a coarse-grained level, the computational cost of
achieving thermodynamic equilibrium at different concentrations is prohibitive. On the other
hand, here we consider a nontrivial dependence on the microscopic degrees of freedom of the
single peptide chain, both in the folded and in the fibrillar state. Other simplified models
describe monomers through just a few macrostates. Notably, we succeed in reproducing, at least
qualitatively, the behavior of fibril formation experiments in the presence of the denaturant
trifluoroethanol (TFE) in different concentrations.

Our model starts from the one introduced by Wako and Saito and then reconsidered by Muñoz,
Eaton and co-workers (WSME-model). The latter has been the subject of many works with
applications to real proteins [16-22]. Despite its simplicity, it has been able to capture the
main features of the kinetic behavior and folding pathways of specific molecules.

The WSME-model is a highly simplified model of the protein folding process built on the premise
that the latter is mainly determined by the structure of the native functional state, whose
knowledge is assumed. Only native interactions are included, classifying the model as Go-like.
Moreover, the interaction between two amino acids in the protein sequence is possible only if
all intervening peptide bonds are in their native conformation. Finally, the entropy loss due
to fixing peptide units in this conformation is explicitly taken into account.

Within this framework, a polypeptide chain made up of $N + 1$ amino acids is described as a
sequence of $N$ peptide bonds. Two conformations are considered for each bond: the native one
and a generic disordered state. Thus, a binary variable $m_i$ is associated with the $i$-th
peptide unit, taking value 1 and 0 in the two cases respectively, and the free energy $F$ of
the model can be written in units of $k_B T$, with $T$ the absolute temperature, as

\[
F(m) = \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \epsilon_{ij} \Delta_{ij} \prod_{k=i}^{j} m_k
       - \sum_{i=1}^{N} q_i (1 - m_i). \tag{1}
\]

Figure 1: In-register parallel β-hairpins in the Aβ40 peptide structural model [3] (top) and
schematic representation of two interacting monomers in our model (bottom). Dots correspond to
amino acids, horizontal and vertical segments are ordered peptide bonds and tilted segments are
unstructured ones. Dashed lines represent contacts between the two monomers. See text for
details.

The contact matrix $\Delta$, with entry $\Delta_{ij}$ equal to 1 if the $i$-th and $j$-th bonds
are close to each other in the native structure and equal to 0 otherwise, tells us which are
the native interactions. Their strength is then quantified by the dimensionless contact energy
$\epsilon_{ij} < 0$, referring to the $i$-th and $j$-th peptide units. This contributes to the
free energy only if the product $\prod_{k=i}^{j} m_k$ does not vanish, that is, only if these
two bonds are the ends of a sequence of ordered peptide units, thus realizing the depicted
interaction. Finally, recognizing the microscopic multiplicity of an abstract disordered state,
an entropic cost $q_i > 0$ is assigned to the ordering of the $i$-th peptide bond.
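
As an illustration, the WSME free energy can be evaluated directly for a given bond configuration. The Python sketch below assumes the form of Eq. (1) implied by the definitions above and by Eq. (2), namely a sum of contact energies over fully ordered stretches minus $q_i$ for every disordered bond. The function name `wsme_free_energy`, the list-of-lists layout and the 0-based indexing are choices made for this example only.

```python
from typing import Sequence


def wsme_free_energy(m: Sequence[int],
                     delta: Sequence[Sequence[int]],
                     eps: Sequence[Sequence[float]],
                     q: Sequence[float]) -> float:
    """Dimensionless WSME free energy (units of k_B T), assumed form of Eq. (1).

    m[i] is 1 for a native bond and 0 for a disordered one; delta is the
    contact matrix, eps holds the (negative) contact energies and q[i] > 0
    is the entropic cost of ordering bond i. Indices are 0-based here.
    """
    n = len(m)
    energy = 0.0
    for i in range(n - 1):
        stretch = m[i]  # running product m_i * ... * m_j
        for j in range(i + 1, n):
            stretch *= m[j]
            if stretch == 0:
                break  # every longer stretch starting at i vanishes too
            if delta[i][j]:
                energy += eps[i][j]
    # Each disordered bond lowers the free energy by q_i.
    entropy = -sum(q[i] * (1 - m[i]) for i in range(n))
    return energy + entropy
```

A contact between bonds $i$ and $j$ therefore counts only when every bond from $i$ to $j$ is native, which is what the early `break` exploits.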

Our model is an extension of the WSME-model, suitable for the thermodynamics of β-amyloid
aggregation. The basic idea is that peptide monomers can either fold into their native
structure or partially lose this feature before aggregating into fibrils. Here we focus, for
simplicity, on α-helices, while the aggregation is assumed to require a hairpin shape and to
proceed by parallel in-register arrangement as in Fig. 1, thus mimicking real fibrils. Other
in-register amyloid structures with more than two β-layers [24] could also be implemented.

We will define the model in a bottom-up approach. Let us begin by introducing the free energy
of isolated monomers, which can fold into an α-helix native structure. In such a structure a
hydrogen bond is formed between peptide units $i$ and $j$ such that $|i - j| = 3$. Then, for a
homogeneous molecule with an odd number of peptide bonds, $2B + 1$, following Eq. (1) we choose
the free energy as

\[
F(m, 0) = -\epsilon_\alpha \sum_{i=1}^{2B-2} \prod_{k=i}^{i+3} m_k
          - q \sum_{i=1}^{2B+1} (1 - m_i), \tag{2}
\]


because $\Delta_{ij} = 1$ if $|i - j| = 3$ and 0 otherwise. The dimensionless parameters
$\epsilon_\alpha > 0$ and $q > 0$ account respectively for the energy strength of each contact
and the entropic cost of ordering each bond.
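
For the homogeneous helix the free energy of Eq. (2) only requires counting windows of four consecutive ordered bonds and the number of disordered bonds. A minimal Python sketch follows; the function name `helix_free_energy` and the 0-based list representation of $m$ are assumptions of this example.

```python
def helix_free_energy(m, eps_alpha, q):
    """F(m, 0) of Eq. (2) for a homogeneous helix, in units of k_B T.

    m is a list of 0/1 values of length 2B + 1 (0-based here). A helical
    contact (i, i+3) is formed when bonds i, i+1, i+2, i+3 are all ordered.
    """
    n = len(m)
    # range(n - 3) runs over the 2B - 2 possible contacts.
    contacts = sum(1 for i in range(n - 3) if all(m[i:i + 4]))
    disordered = n - sum(m)
    return -eps_alpha * contacts - q * disordered
```

With $B = 2$ (five bonds) the fully ordered chain has two contacts, so `helix_free_energy([1] * 5, 2.0, 1.5)` evaluates to $-4$, while the fully disordered chain gives $-5q = -7.5$.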

As far as the interaction between different peptides is concerned, we assume that aggregation
involves and requires a partial β-hairpin shape, which is obtained by removing some helical
contacts. In such a view, the small loop region of the hairpin formed by a monomer with
$2B + 1$ peptide units is identified with the peptide bond $B + 1$, from which two strands
depart as shown in Fig. 1. Fibril formation is triggered by pairing a part of the ordered
fragments of the two strands from one molecule with the same part of another. A measure of the
extent of "β-order" associated with a pair of consecutive β-hairpins, with WSME-variables $m$
and $m'$, is provided by
$B(m, m') = \sum_{i=1}^{B} \prod_{k=B+1-i}^{B+1+i} m_k m'_k$ and vanishes if the loop regions
are not both ordered. Otherwise, it is the common number of ordered peptide units facing each
other, counted from the loops. For the case shown in Fig. 1 we have $B(m, m') = 2$.
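
The β-order measure can be sketched in Python as follows. The code assumes that $B(m, m')$ counts, for $i = 1, \ldots, B$, the symmetric stretches of bonds $B+1-i, \ldots, B+1+i$ that are ordered in both chains, which matches the verbal description above; the function name `beta_order` and the 0-based indexing are choices of this example.

```python
def beta_order(m, m_prime, B):
    """Number of ordered peptide units facing each other, counted from the loop.

    m and m_prime are 0/1 lists of length 2B + 1. The loop is the bond with
    1-based index B + 1, i.e. 0-based index B.
    """
    if len(m) != 2 * B + 1 or len(m_prime) != 2 * B + 1:
        raise ValueError("each chain must have 2B + 1 peptide bonds")
    count = 0
    for i in range(1, B + 1):
        lo, hi = B - i, B + i  # 0-based ends of the stretch around the loop
        if all(m[lo:hi + 1]) and all(m_prime[lo:hi + 1]):
            count += 1
        else:
            break  # longer stretches contain this one, so they vanish too
    return count
```

For instance, with $B = 3$, a chain `[0, 1, 1, 1, 1, 1, 0]` facing a fully ordered one gives a β-order of 2, and any chain with a disordered loop gives 0.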

We can then interpret the aggregation phenomenon, which is driven and stabilized by hydrogen
bonds, as the formation of $2c$ contacts between the β-portions of the two different monomers,
where $c$ thus has to be between 0 and $B(m, m')$. We assume that the pairing between different
peptides starts from the loops and goes on sequentially along the strands, suggesting the idea
that these regions, having the same shape, are the most suitable to initiate the aggregation.
In equilibrium conditions this mechanism corresponds to assuming that there is only one way to
form the above $2c$ contacts. In Fig. 1 all available interactions of this kind are present.

A segment can gain energy by being either in a helical state and unbound to other peptides or
in a β-hairpin state and bound to another hairpin. We assume that, if a molecule binds another
one with $2c > 0$ hydrogen bonds, then the helical contacts including peptide bonds
participating in the pairing, that is, those in the stretch going from $B - c + 1$ to
$B + c + 1$, are suppressed. Hence, the free energy of that monomer becomes

\[
F(m, c) = F(m, 0) + \epsilon_\alpha
          \sum_{i=\max\{B-c-2,\,1\}}^{\min\{B+c+1,\,2B-2\}} \prod_{k=i}^{i+3} m_k, \tag{3}
\]

where $F(m, 0)$ is the free energy of the isolated α-helix defined by Eq. (2). In turn, we
shall denote by $\epsilon_\beta > 0$ the energetic gain, in units of $k_B T$, of one contact
between different monomers.
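
The suppression of helical contacts in Eq. (3) can be made concrete with a short Python sketch. It assumes that Eq. (3) applies only for $c > 0$, with $F(m, 0)$ the plain helix free energy of Eq. (2); the helper names `helix_contacts` and `monomer_free_energy` are hypothetical.

```python
def helix_contacts(m, first, last):
    """Count formed helical contacts (i, i+3) for 1-based i in [first, last]."""
    return sum(1 for i in range(first, last + 1) if all(m[i - 1:i + 3]))


def monomer_free_energy(m, c, B, eps_alpha, q):
    """F(m, c) in units of k_B T for a chain of 2B + 1 peptide bonds.

    For c = 0 this is the isolated helix of Eq. (2). For c > 0 the contacts
    overlapping the stretch B - c + 1 ... B + c + 1 are removed, as in Eq. (3).
    """
    n = 2 * B + 1
    f = -eps_alpha * helix_contacts(m, 1, n - 3) - q * (n - sum(m))
    if c > 0:
        first = max(B - c - 2, 1)
        last = min(B + c + 1, n - 3)
        f += eps_alpha * helix_contacts(m, first, last)
    return f
```

For a fully ordered chain with $B = 5$ and $\epsilon_\alpha = 1$, the eight helical contacts give $F(m, 0) = -8$; pairing with $c = 1$ leaves only the two outermost contacts, and $c = 2$ already removes all of them.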

Figure 2: Phase diagram of the model in the plane $(\epsilon_\alpha, \epsilon_\beta)$ at
$\rho = 0.7$ and $\sigma = 5$, $\sigma$ being the entropy loss associated with the aggregation.
Here $B = 10$, corresponding to peptides of 21 peptide units. White, light-gray and gray
regions are the helix, fibril and unfolded phases respectively. The top inset depicts the order
parameters $\rho_\alpha$ and $\rho_\beta$ as a function of the TFE concentration $M$, while the
bottom one shows the phase diagram in the plane $(M, \rho)$. See text for details.



Now we take into account the translational and rotational entropy loss $S(c) \ge 0$ due to the
aggregation of different peptides with the formation of $2c$ hydrogen bonds. We will choose $S$
so that $S(0) = 0$. Finally, the free energy for a system of two close molecules that can
aggregate takes the form $F(m, c) + F(m', c) - 2\epsilon_\beta c + S(c)$, with the constraint
$c \le B(m, m')$.
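
Putting the pieces together, the two-molecule free energy can be sketched in Python. The code below is self-contained and assumes the step-like entropy loss used later in the text, $S(0) = 0$ and $S(c) = \sigma$ for $c \ge 1$; all function names are hypothetical and indices inside the helpers follow the conventions stated in their docstrings.

```python
def helix_contacts(m, first, last):
    """Count formed helical contacts (i, i+3) for 1-based i in [first, last]."""
    return sum(1 for i in range(first, last + 1) if all(m[i - 1:i + 3]))


def monomer_free_energy(m, c, B, eps_alpha, q):
    """F(m, c): helix free energy with contacts around the loop suppressed."""
    n = 2 * B + 1
    f = -eps_alpha * helix_contacts(m, 1, n - 3) - q * (n - sum(m))
    if c > 0:
        f += eps_alpha * helix_contacts(m, max(B - c - 2, 1), min(B + c + 1, n - 3))
    return f


def beta_order(m, m_prime, B):
    """B(m, m'): paired ordered units counted outward from the loop."""
    count = 0
    for i in range(1, B + 1):
        if not (all(m[B - i:B + i + 1]) and all(m_prime[B - i:B + i + 1])):
            break
        count += 1
    return count


def pair_free_energy(m, m_prime, c, B, eps_alpha, eps_beta, q, sigma):
    """F(m, c) + F(m', c) - 2 eps_beta c + S(c), with 0 <= c <= B(m, m')."""
    if not 0 <= c <= beta_order(m, m_prime, B):
        raise ValueError("c must satisfy 0 <= c <= B(m, m')")
    entropy_loss = sigma if c > 0 else 0.0  # assumed step-like S(c)
    return (monomer_free_energy(m, c, B, eps_alpha, q)
            + monomer_free_energy(m_prime, c, B, eps_alpha, q)
            - 2.0 * eps_beta * c + entropy_loss)
```

With two fully ordered chains, $B = 5$, $\epsilon_\alpha = 1$, $\epsilon_\beta = 3$ and $\sigma = 5$, the unpaired state has free energy $-16$, while full pairing ($c = 5$) gives $-25$: the helical contacts are lost but the inter-chain contacts more than compensate.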

We want to stress that the β-hairpin shape is not needed a priori in order to have aggregation
between different monomers, but it is rather considered as a concomitant event to the matching
process. Moreover, the requirement of ordered stretches of peptide units to form contacts
between molecules is just a way to express that only few chain conformations are suitable for
aggregation. Intra-helix contacts represent general native interactions protecting isolated
conformers from the aggregation-prone states [7].

Finally, we model the formation of an aggregate as the growth of a "one-dimensional structure".
To this aim, we describe a system of many peptides by placing them on distinct sites $l$,
$l = 1, 2, \ldots, L$, of a one-dimensional lattice and including in the model only
interactions between nearest-neighbor molecules. The occupation number $n_l$ of site $l$ is 1
if a monomer is present in that position and 0 otherwise. Furthermore, to each site we
associate WSME-variables describing the conformation of the peptide chain placed there, and
thus $m_{l,i}$ will give the state of the $i$-th peptide unit of the molecule at $l$. In order
to avoid an unphysical entropic contribution, we set $m_{l,i}$ to 0 for any $i$ if $n_l = 0$.
For simplicity, the symbol $m_l$ will be used for the array
$(m_{l,1}, m_{l,2}, \ldots, m_{l,2B+1})$ of binary variables related to node $l$, $m$ for all
these variables and $n$ for the collection of occupation numbers. Finally, we need the variable
$c_l$ keeping count of the contacts between close molecules residing at nodes $l$ and $l + 1$.
This variable ranges from 0 to $B(m_l, m_{l+1})$ and, as expected, no interaction is possible
between sites $l$ and $l + 1$ when they are not both occupied.

The free energy of the full model is then a generalization of the one introduced above for two
monomers. Using the dummy variables $c_0 = 0$ and $c_L = 0$ and noticing that the number of
peptide units of the molecule at site $l$ involved in contacts with other molecules is properly
related to $\max\{c_{l-1}, c_l\}$, this free energy reads

\[
H_L(n, m, c) = \sum_{l=1}^{L} n_l F(m_l, \max\{c_{l-1}, c_l\})
               - \sum_{l=1}^{L-1} \left[ 2\epsilon_\beta c_l - S(c_l) \right]
               - \mu \sum_{l=1}^{L} n_l. \tag{4}
\]

The contribution of the chemical potential $\mu$, which will be determined by imposing a given
value $\rho$ of the monomer density, has been included here.
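
The lattice free energy can be evaluated for a given configuration with the Python sketch below. It assumes that the chemical potential enters as $-\mu \sum_l n_l$ and that $S(c)$ is the step function $S(0) = 0$, $S(c \ge 1) = \sigma$ adopted later in the text; the function names and the 0-based site indexing are choices of this example.

```python
def helix_contacts(m, first, last):
    """Count formed helical contacts (i, i+3) for 1-based i in [first, last]."""
    return sum(1 for i in range(first, last + 1) if all(m[i - 1:i + 3]))


def monomer_free_energy(m, c, B, eps_alpha, q):
    """F(m, c): helix free energy with contacts around the loop suppressed."""
    n = 2 * B + 1
    f = -eps_alpha * helix_contacts(m, 1, n - 3) - q * (n - sum(m))
    if c > 0:
        f += eps_alpha * helix_contacts(m, max(B - c - 2, 1), min(B + c + 1, n - 3))
    return f


def beta_order(m, m_prime, B):
    """B(m, m'): paired ordered units counted outward from the loop."""
    count = 0
    for i in range(1, B + 1):
        if not (all(m[B - i:B + i + 1]) and all(m_prime[B - i:B + i + 1])):
            break
        count += 1
    return count


def lattice_free_energy(n, m, c, B, eps_alpha, eps_beta, q, sigma, mu):
    """H_L(n, m, c) for L sites (0-based).

    n[l] is the occupation number, m[l] the chain at site l (all zeros when
    the site is empty) and c[l], for l = 0 ... L - 2, the number of contact
    pairs between sites l and l + 1.
    """
    L = len(n)
    padded = [0] + list(c) + [0]  # dummy variables c_0 = c_L = 0
    total = 0.0
    for l in range(L):
        if n[l]:
            bound = max(padded[l], padded[l + 1])
            total += monomer_free_energy(m[l], bound, B, eps_alpha, q)
    for l in range(L - 1):
        if not 0 <= c[l] <= beta_order(m[l], m[l + 1], B):
            raise ValueError("c[l] exceeds the beta-order of the neighbouring chains")
        if c[l] > 0:
            total -= 2.0 * eps_beta * c[l] - sigma  # assumed step-like S(c)
    return total - mu * sum(n)
```

Because empty sites carry all-zero chains, their β-order with any neighbour is 0, so the constraint on `c[l]` automatically forbids contacts across an empty site.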

The Boltzmann distribution with the free energy of Eq. (4) makes it possible to evaluate
equilibrium expectation values of physical observables. The present model can be solved exactly
by means of a transfer matrix method, because of the presence of short-range interactions in a
one-dimensional system and the possibility of exactly tracing over the WSME-variables. Details
are given in the supplementary material [25]. Here we restrict ourselves to some results on the
behavior of two order parameters related to the fraction of isolated helices and aggregated
molecules. The former, $\rho_\alpha$, measures the global order of peptides when they do not
interact at all and is defined as the equilibrium average of the fraction of native bonds per
site, normalized to the density $\rho$, considering only microscopic configurations which
exclude aggregation phenomena. The latter, $\rho_\beta$ [25], accounts for bonds between
different monomers and is given by the fraction of formed contacts between two consecutive
lattice nodes, again normalized with respect to $\rho$.
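
For reference, the equilibrium averages mentioned above follow the standard grand-canonical recipe. The relations below are a sketch, assuming that the chemical potential enters the free energy as $-\mu \sum_l n_l$; the symbol $\Xi_L$ for the partition sum is introduced here for illustration and is not taken from the text.

```latex
\[
\Xi_L = \sum_{n,m,c} e^{-H_L(n,m,c)}, \qquad
\langle O \rangle = \frac{1}{\Xi_L} \sum_{n,m,c} O(n,m,c)\, e^{-H_L(n,m,c)},
\]
\[
\rho = \frac{1}{L} \Big\langle \sum_{l=1}^{L} n_l \Big\rangle
     = \frac{1}{L} \frac{\partial \ln \Xi_L}{\partial \mu}.
\]
```

The second line is the condition that fixes $\mu$ once a density $\rho$ is imposed. In a transfer matrix treatment $\Xi_L$ is expressed through powers of a finite matrix, so that $\ln \Xi_L / L$ is controlled by its largest eigenvalue as $L$ grows.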

Fig. 2 shows the phase diagram of the model in the thermodynamic limit $L \to \infty$, where
different phases are separated by the conditions $\rho_\alpha = 1/2$ and $\rho_\beta = 1/2$.
Here we choose $B = 10$ and $q = 2$, but other values of $q$ only marginally affect this
diagram. Moreover, we consider the case $S(0) = 0$ and $S(c) = \sigma > 0$ independent of $c$
if $c = 1, 2, \ldots, B$, assuming that most of the entropy loss in aggregation is due to the
formation of just one contact between monomers. The parameters $\epsilon_\alpha$ and
$\epsilon_\beta$ are referred to the midpoint $\epsilon_m \approx 2.35$, depending on both $B$
and $q$, of a helix in the pure WSME-model. The energy scale $\epsilon_m$ may be obtained by
imposing $\rho_\alpha = 1/2$ when the density $\rho$ approaches 0.
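
The helix midpoint can be explored numerically with a small transfer-type recursion for the isolated chain. The Python sketch below assumes that, at vanishing density, the helical order parameter reduces to the average fraction of ordered bonds of a single chain governed by Eq. (2); under that assumption it can be used to check the quoted midpoint for $B = 10$ and $q = 2$. Function names, the finite-difference step and the bisection bracket are choices of this example.

```python
import math


def helix_log_partition(n_bonds, eps_alpha, q):
    """ln Z of the homogeneous WSME helix of Eq. (2), by a 4-state recursion.

    State s in {0, 1, 2, 3} is the number of trailing ordered bonds, capped
    at 3; each further ordered bond after that closes one (i, i+3) contact.
    """
    w_dis = math.exp(q)              # Boltzmann weight of a disordered bond
    w_contact = math.exp(eps_alpha)  # weight of each formed helical contact
    z = [1.0, 0.0, 0.0, 0.0]         # empty chain: no trailing ordered bonds
    for _ in range(n_bonds):
        total = sum(z)
        z = [total * w_dis,             # append a disordered bond
             z[0],                      # first ordered bond of a run
             z[1],                      # second ordered bond of a run
             z[2] + z[3] * w_contact]   # third bond, or a longer run closing a contact
    return math.log(sum(z))


def native_fraction(n_bonds, eps_alpha, q, h=1e-6):
    """Average fraction of ordered bonds, from d ln Z / d q by central difference."""
    dlnz = (helix_log_partition(n_bonds, eps_alpha, q + h)
            - helix_log_partition(n_bonds, eps_alpha, q - h)) / (2.0 * h)
    return 1.0 - dlnz / n_bonds  # d ln Z / d q is the mean number of disordered bonds


def helix_midpoint(n_bonds, q, lo=0.0, hi=10.0, tol=1e-8):
    """Bisection for the eps_alpha at which the native fraction equals 1/2."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if native_fraction(n_bonds, mid, q) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Calling `helix_midpoint(21, 2.0)` gives the midpoint for $B = 10$ ($2B + 1 = 21$ bonds) under the stated assumption. The recursion works because a run of $r \ge 4$ ordered bonds contains exactly $r - 3$ helical contacts.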

Three regions are recognized. The first is the region of unfolded isolated peptides, where both
$\rho_\alpha$ and $\rho_\beta$ are less than 1/2. For small $\epsilon_\alpha$ the order
parameter $\rho_\alpha$ is close to the value $1/(1 + e^{q}) \approx 0.12$ of a completely
unfolded structure in the WSME-model. The second region, where $\rho_\alpha > 1/2$ and
$\rho_\beta < 1/2$, corresponds to native isolated helical peptides. The boundary between the
unfolded region and the α-helix region depends weakly on both $\sigma$ and $\rho$ and almost
coincides with the one obtainable in the plain WSME-model for the same helices. Finally, there
is the fibril region, where $\rho_\alpha < 1/2$ and $\rho_\beta > 1/2$. Increasing $\rho$ or
decreasing $\sigma$ favors aggregation by lowering the boundary between this region and the
denatured one.
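
The value quoted for the completely unfolded structure can be checked by hand. As a short worked example, assume an isolated chain with vanishing helical contact energy, so that the free energy of Eq. (2) reduces to its entropic term and the bonds become independent:

```latex
\[
F(m, 0)\big|_{\epsilon_\alpha = 0} = -q \sum_{i=1}^{2B+1} (1 - m_i)
\quad\Longrightarrow\quad
\langle m_i \rangle = \frac{e^{0}}{e^{0} + e^{q}} = \frac{1}{1 + e^{q}},
\]
\[
q = 2: \qquad \frac{1}{1 + e^{2}} \approx \frac{1}{1 + 7.389} \approx 0.119 .
\]
```

Each bond is ordered with Boltzmann weight 1 and disordered with weight $e^{q}$, which gives the fraction of native bonds of about 0.12 for $q = 2$.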

Since the energetic parameters $\epsilon_\alpha$ and $\epsilon_\beta$ are effective parameters
mediated by the solvent, we may expect them to vary in a nontrivial way as external conditions,
such as temperature, denaturant concentration, solution ionic strength and pH, are changed. The
denaturant agent TFE is commonly used in fibril formation assays because, at moderate
concentrations, it disrupts the native structures of isolated proteins, without preventing the
formation of inter-molecular contacts [26]. At high concentrations, TFE addition results in the
stabilization of isolated unfolded proteins.

We can mimic the TFE effect by assuming that both $\epsilon_\alpha$ and $\epsilon_\beta$ are
simple linear decreasing functions of its concentration $M$, with $\epsilon_\alpha$ decreasing
more than $\epsilon_\beta$. For example, by moving along the straight line in Fig. 2 from H at
$M = 0$ to U at $M = 1$, the observed native-fibril-unfolded pattern [26] can be qualitatively
reproduced. Given such a dependence of $\epsilon_\alpha$ and $\epsilon_\beta$ on $M$, the top
inset in Fig. 2 depicts the profile of $\rho_\alpha$ and $\rho_\beta$ as a function of the TFE
concentration, whereas the bottom one reports the phase diagram of the model in the plane
$(M, \rho)$. The pattern discussed above is present for high values of peptide density, with
the fibril stability interval in TFE concentration narrowing with decreasing peptide density.
At low density the fibril phase is no longer present and the peptides always remain isolated,
going directly from the native to the unfolded state with increasing TFE concentration.
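
The linear parametrization of the TFE effect amounts to interpolating the two energies between the end points H and U. A minimal Python sketch follows; the end-point values `POINT_H` and `POINT_U` are hypothetical numbers chosen only so that the helical energy drops more than the inter-chain one, and they are not read from Fig. 2.

```python
def tfe_energies(M, at_zero, at_one):
    """Linearly interpolate (eps_alpha, eps_beta) between M = 0 and M = 1."""
    if not 0.0 <= M <= 1.0:
        raise ValueError("M is taken in [0, 1] in this sketch")
    return tuple(a + M * (b - a) for a, b in zip(at_zero, at_one))


# Hypothetical end points, in units of the helix midpoint energy.
# These are illustrative values only, NOT taken from the phase diagram.
POINT_H = (1.3, 1.1)  # (eps_alpha, eps_beta) at M = 0
POINT_U = (0.6, 0.9)  # (eps_alpha, eps_beta) at M = 1
```

Evaluating `tfe_energies(M, POINT_H, POINT_U)` on a grid of `M` values traces the straight path along which the order parameters would be computed.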

In summary, in this Letter we have proposed a highly simplified equilibrium model to describe
the aggregation of identical monomers and the consequent formation of fibrillar structures.
Despite its simplicity, the model has been shown to explain different phases of the system,
such as unfolded and aggregated states, and to reproduce qualitatively the observed trend of
fibril formation experiments as a function of trifluoroethanol concentration. Moreover, we
argue that a kinetic version of the model could shed new light on protein aggregation dynamics,
and work is in progress along this line.